Covid & Economics Case for Rich & Poor Countries. Advanced Visualizations in R.

1. Introduction and Objectives

In this project we look at Covid data from a different perspective: we analyze it with regard to the world economy. We examine countries' economic parameters and derive insights from them. In particular, we want to answer questions such as: which countries have more cases, poor countries or rich countries? We also group countries by continent and look for patterns they may share, and we present visualizations that tell the story of the relationship between Covid and the economy. In the second part of the analysis, we introduce foreign exchange rates and examine them to derive further insights. Our hypothesis is that countries with more Covid cases should see their exchange rate fall and face devaluation problems. We also observe the data for different continents and groups.

2. Data Source

The data is hosted on GitHub in the Our World in Data repository and is updated regularly. We read it directly from GitHub over a live connection: https://github.com/owid/covid-19-data/tree/master/public/data

Information about Confirmed cases and deaths comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).

Information about Hospitalizations and intensive care unit (ICU) admissions comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the COVID Tracking Project for the United States; the COVID-19 Tracker for Canada.

Information about Testing for COVID-19 is collected by the Our World in Data team from official reports.

Information about Vaccinations against COVID-19 is collected by the Our World in Data team from official reports.

The benefit of this data is that it is constantly updated.
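A minimal sketch of the live connection described above. The raw-file URL is an assumption based on the repository layout and does not appear in the original, and `load_covid()` is a hypothetical helper name.

```r
# Hypothetical helper: read the CSV from a URL or a local path and
# convert the date column to Date so date functions can be applied to it.
load_covid <- function(src) {
  covid <- read.csv(src, stringsAsFactors = FALSE)
  covid$date <- as.Date(covid$date)
  covid
}

# Assumed location of the raw CSV inside the repository linked above;
# check the repository if the path has changed.
owid_url <- "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"

# Not run here because it needs network access:
# data <- load_covid(owid_url)
```

Reading from the URL on every run keeps the analysis current, at the cost of results that change from one day to the next.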

3. Data Description

The data contains 67,439 rows and 59 variables at the time of writing. Each observation is one country's data for one day, so the number of rows grows daily. The structure printed below comes from a snapshot with 61,486 rows and 55 columns.

Rows: 61,486
Columns: 55
$ iso_code                              <chr> "AFG", "AFG", "AFG", "AFG", "...
$ continent                             <chr> "Asia", "Asia", "Asia", "Asia...
$ location                              <chr> "Afghanistan", "Afghanistan",...
$ date                                  <date> 2020-02-24, 2020-02-25, 2020...
$ total_cases                           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 4,...
$ new_cases                             <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 2,...
$ new_cases_smoothed                    <dbl> NA, NA, NA, NA, NA, 0.143, 0....
$ total_deaths                          <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_deaths                            <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_deaths_smoothed                   <dbl> NA, NA, NA, NA, NA, 0, 0, 0, ...
$ total_cases_per_million               <dbl> 0.026, 0.026, 0.026, 0.026, 0...
$ new_cases_per_million                 <dbl> 0.026, 0.000, 0.000, 0.000, 0...
$ new_cases_smoothed_per_million        <dbl> NA, NA, NA, NA, NA, 0.004, 0....
$ total_deaths_per_million              <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_deaths_per_million                <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_deaths_smoothed_per_million       <dbl> NA, NA, NA, NA, NA, 0, 0, 0, ...
$ reproduction_rate                     <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ icu_patients                          <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ icu_patients_per_million              <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ hosp_patients                         <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ hosp_patients_per_million             <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ weekly_icu_admissions                 <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ weekly_icu_admissions_per_million     <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ weekly_hosp_admissions                <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ weekly_hosp_admissions_per_million    <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ total_tests                           <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_tests                             <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ total_tests_per_thousand              <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_tests_per_thousand                <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_tests_smoothed                    <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_tests_smoothed_per_thousand       <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ positive_rate                         <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ tests_per_case                        <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ tests_units                           <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ total_vaccinations                    <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_vaccinations                      <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_vaccinations_smoothed             <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ total_vaccinations_per_hundred        <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ new_vaccinations_smoothed_per_million <lgl> NA, NA, NA, NA, NA, NA, NA, N...
$ stringency_index                      <dbl> 8.33, 8.33, 8.33, 8.33, 8.33,...
$ population                            <dbl> 38928341, 38928341, 38928341,...
$ population_density                    <dbl> 54.422, 54.422, 54.422, 54.42...
$ median_age                            <dbl> 18.6, 18.6, 18.6, 18.6, 18.6,...
$ aged_65_older                         <dbl> 2.581, 2.581, 2.581, 2.581, 2...
$ aged_70_older                         <dbl> 1.337, 1.337, 1.337, 1.337, 1...
$ gdp_per_capita                        <dbl> 1803.987, 1803.987, 1803.987,...
$ extreme_poverty                       <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ cardiovasc_death_rate                 <dbl> 597.029, 597.029, 597.029, 59...
$ diabetes_prevalence                   <dbl> 9.59, 9.59, 9.59, 9.59, 9.59,...
$ female_smokers                        <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ male_smokers                          <dbl> NA, NA, NA, NA, NA, NA, NA, N...
$ handwashing_facilities                <dbl> 37.746, 37.746, 37.746, 37.74...
$ hospital_beds_per_thousand            <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
$ life_expectancy                       <dbl> 64.83, 64.83, 64.83, 64.83, 6...
$ human_development_index               <dbl> 0.498, 0.498, 0.498, 0.498, 0...

A sample of the data table is shown below.

iso_code continent location date total_cases new_cases new_cases_smoothed total_deaths new_deaths new_deaths_smoothed total_cases_per_million new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate icu_patients icu_patients_per_million hosp_patients hosp_patients_per_million weekly_icu_admissions weekly_icu_admissions_per_million weekly_hosp_admissions weekly_hosp_admissions_per_million total_tests new_tests total_tests_per_thousand new_tests_per_thousand new_tests_smoothed new_tests_smoothed_per_thousand positive_rate tests_per_case tests_units total_vaccinations new_vaccinations new_vaccinations_smoothed total_vaccinations_per_hundred new_vaccinations_smoothed_per_million stringency_index population population_density median_age aged_65_older aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate diabetes_prevalence female_smokers male_smokers handwashing_facilities hospital_beds_per_thousand life_expectancy human_development_index
AFG Asia Afghanistan 2020-02-24 1 1 NA NA NA NA 0.026 0.026 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8.33 38928341 54.422 18.6 2.581 1.337 1803.987 NA 597.029 9.59 NA NA 37.746 0.5 64.83 0.498
AFG Asia Afghanistan 2020-02-25 1 0 NA NA NA NA 0.026 0.000 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8.33 38928341 54.422 18.6 2.581 1.337 1803.987 NA 597.029 9.59 NA NA 37.746 0.5 64.83 0.498
AFG Asia Afghanistan 2020-02-26 1 0 NA NA NA NA 0.026 0.000 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8.33 38928341 54.422 18.6 2.581 1.337 1803.987 NA 597.029 9.59 NA NA 37.746 0.5 64.83 0.498
AFG Asia Afghanistan 2020-02-27 1 0 NA NA NA NA 0.026 0.000 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8.33 38928341 54.422 18.6 2.581 1.337 1803.987 NA 597.029 9.59 NA NA 37.746 0.5 64.83 0.498
AFG Asia Afghanistan 2020-02-28 1 0 NA NA NA NA 0.026 0.000 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 8.33 38928341 54.422 18.6 2.581 1.337 1803.987 NA 597.029 9.59 NA NA 37.746 0.5 64.83 0.498

4. Aggregating Data

We select only the important columns: location, continent, gdp_per_capita, population_density, total_cases, total_cases_per_million and population. This keeps memory use low and keeps the graphs focused on useful insights. The next step is to remove the World and International rows from the dataset, since we concentrate on countries and those aggregate rows would appear as outliers in the graphs.

location continent total_cases gdp_per_capita population_density total_cases_per_million population
Afghanistan Asia 54403 1803.987 54.422 1397.517 38928341
Albania Europe 69916 11803.431 104.871 24294.948 2877800
Algeria Africa 104852 13913.839 17.348 2391.095 43851043
Andorra Europe 9379 0.000 163.755 121387.433 77265
Angola Africa 19177 5819.495 23.890 583.486 32866268
Antigua and Barbuda North America 192 21490.943 231.845 1960.624 97928
Argentina South America 1843077 18933.907 16.177 40779.850 45195777
Armenia Asia 165528 8787.580 102.931 55860.590 2963234
Australia Oceania 28755 44648.710 3.202 1127.652 25499881
Austria Europe 399798 45436.686 106.749 44390.433 9006400
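An aggregation that yields a table of this shape can be sketched as follows. This is a sketch under assumptions, not the author's code: it keeps the most recent row per country (the original does not say how the daily rows were collapsed), it uses base R to stay dependency-free, and `aggregate_latest()` is a hypothetical name. The toy rows reuse values printed in this document; the 2021-01-21 date is illustrative.

```r
keep_cols <- c("location", "continent", "total_cases", "gdp_per_capita",
               "population_density", "total_cases_per_million", "population")

# Hypothetical helper: drop aggregate rows, keep the latest row per country.
aggregate_latest <- function(daily) {
  # World and International are aggregates, not countries.
  daily <- daily[!daily$location %in% c("World", "International"), ]
  # Sort by country and date so the last row per country is the most recent.
  daily <- daily[order(daily$location, daily$date), ]
  latest <- daily[!duplicated(daily$location, fromLast = TRUE), keep_cols]
  rownames(latest) <- NULL
  latest
}

# Toy daily data in the same shape as the source.
daily <- data.frame(
  location = c("Afghanistan", "Afghanistan", "Albania", "World", "International"),
  continent = c("Asia", "Asia", "Europe", NA, NA),
  date = as.Date(c("2020-02-24", "2021-01-21", "2021-01-21",
                   "2021-01-21", "2021-01-21")),
  total_cases = c(1, 54403, 69916, NA, NA),
  total_cases_per_million = c(0.026, 1397.517, 24294.948, NA, NA),
  gdp_per_capita = c(1803.987, 1803.987, 11803.431, NA, NA),
  population_density = c(54.422, 54.422, 104.871, NA, NA),
  population = c(38928341, 38928341, 2877800, NA, NA),
  stringsAsFactors = FALSE
)

countries <- aggregate_latest(daily)
print(countries)
```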

6. Grouping Countries by Income

Let’s divide the countries into three groups and observe how many countries fall into each. We use GDP_per_capita to define the groups: poor, medium and rich. Countries with GDP_per_capita between 0 and 7,000 USD form the “poor” group. Countries with GDP_per_capita between 7,000 and 20,000 USD are “medium” income countries. The third group, “rich”, consists of countries with GDP_per_capita above 20,000 USD.
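The grouping rule above can be sketched with `cut()`. The helper name `income_group()` is hypothetical, and the boundary handling is an assumption, since the text does not say which group a country at exactly 7,000 or 20,000 USD belongs to. The sample rows are taken from the aggregated table earlier in the document.

```r
# Hypothetical helper: assign an income group from GDP per capita (USD).
# Assumption: 7,000 counts as "medium" and 20,000 counts as "rich".
income_group <- function(gdp_per_capita) {
  cut(gdp_per_capita,
      breaks = c(0, 7000, 20000, Inf),
      labels = c("poor", "medium", "rich"),
      right = FALSE, include.lowest = TRUE)
}

countries <- data.frame(
  location = c("Afghanistan", "Albania", "Algeria", "Angola", "Australia", "Austria"),
  continent = c("Asia", "Europe", "Africa", "Africa", "Oceania", "Europe"),
  gdp_per_capita = c(1803.987, 11803.431, 13913.839, 5819.495, 44648.710, 45436.686),
  stringsAsFactors = FALSE
)
countries$group <- income_group(countries$gdp_per_capita)

# Number of countries per continent and income group.
print(table(countries$continent, countries$group))
```

Note that a GDP_per_capita of 0, as shown for Andorra in the sample table, would be classified as poor under this rule; such a value may be a placeholder for missing data and is worth checking before grouping.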

Looking at the graph, we see that 40 countries in Africa fall into the poor category. We definitely do not want the world to have that many poor countries, especially on one continent. Europe is the continent with many rich countries and only one poor country, Moldova. North and South America have many medium income countries. Our wish, however, is for wealth to be evenly distributed across continents and countries.

7. Rich vs Poor Countries & Covid Cases

The graph below addresses the question: do rich countries have more Covid cases? The graph suggests that this assumption holds: rich countries have more Covid cases. We think this is because rich countries test their populations at scale and keep more accurate case statistics than poor countries. Poor countries do not count cases as reliably and, more critically, do not test much of their population because tests are not available to most of them.
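One way to put a number on the comparison is the median of total_cases_per_million per income group. This is a sketch, not the author's method: using cases per million and the median are both assumptions, and the eight rows below are only the sample rows printed earlier in the document, far too few to support any conclusion on their own.

```r
countries <- data.frame(
  location = c("Afghanistan", "Angola", "Albania", "Algeria",
               "Argentina", "Armenia", "Australia", "Austria"),
  gdp_per_capita = c(1803.987, 5819.495, 11803.431, 13913.839,
                     18933.907, 8787.580, 44648.710, 45436.686),
  total_cases_per_million = c(1397.517, 583.486, 24294.948, 2391.095,
                              40779.850, 55860.590, 1127.652, 44390.433),
  stringsAsFactors = FALSE
)

# Same thresholds as in the grouping section (boundary handling assumed).
countries$group <- cut(countries$gdp_per_capita,
                       breaks = c(0, 7000, 20000, Inf),
                       labels = c("poor", "medium", "rich"),
                       right = FALSE, include.lowest = TRUE)

# Median cases per million within each income group.
medians <- aggregate(total_cases_per_million ~ group, data = countries, FUN = median)
print(medians)
```

The median is used here because a few countries with very high counts can dominate a group mean.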

8. Heatmap of I and II Wave of Covid

The heatmap below shows monthly new Covid cases for European countries. It clearly shows when the first and second waves of coronavirus cases spiked. The colors are based on the number of cases: dark red means many cases, while whitish yellow means fewer. In the first wave, Spain, Italy, Germany and France had many cases during March and April 2020. During the second wave we see many reddish cells during October 2020 and January 2021, especially in countries such as Spain, Italy, Poland, Germany, France, Belgium and the Netherlands.

library(lubridate)
library(dplyr)
library(ggplot2)  # needed for ggplot() and the theme elements below

# EU member states shown in the heatmap
EU <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czech Republic", 
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", 
    "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", 
    "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")

# `data` is the Covid data frame loaded earlier in the analysis
data <- data.frame(data)

# Daily new cases, restricted to EU countries
data_for_month <- data %>% dplyr::select(location, date, new_cases) %>% 
    filter(location %in% EU)

# Continent-level series (not used in the heatmap itself)
datats <- data %>% dplyr::select(continent, date, total_cases)

# Round each date down to the first day of its month
data_for_month$month <- floor_date(data_for_month$date, "month")

# Sum new cases per country and month; as_tibble() replaces the deprecated as.tibble().
# A month containing an NA sums to NA and is drawn white (see na.value below).
data_for_month <- as_tibble(data_for_month) %>% group_by(location, month) %>% 
    summarize(new_cases = as.numeric(sum(new_cases)))

# Discrete month labels for the x axis; cases expressed in thousands
data_for_month$month <- as.character(data_for_month$month)
data_for_month$new_cases_1000 <- as.numeric(data_for_month$new_cases/1000)

# One tile per country and month, coloured from green (low) through pale yellow to red (high)
ggplot(data_for_month, aes(month, location)) + geom_tile(aes(fill = new_cases_1000), 
    colour = "white") + labs(title = "Which month had the biggest amount of new cases?", 
    subtitle = "First wave in Mar 2020 and the second wave in Sep-Oct 2020!", caption = "Date: 2021-Jan-29") + 
    scale_x_discrete("", expand = c(0, 0)) + scale_y_discrete("", expand = c(0, 0)) + 
    scale_fill_gradient2(name = "New Cases 000'", low = "#006400", mid = "#f2f6c3", 
        high = "#cd0000", midpoint = 0.5, na.value = "white") + theme(legend.position = "right", 
    axis.ticks = element_blank(), axis.text.x = element_text(angle = 90, hjust = 0.5), 
    axis.text.y = element_text(size = 10), panel.background = element_blank(), plot.title = element_text(size = 16, 
        face = "bold", color = "#0B8389"), plot.subtitle = element_text(size = 11))

9. Top 15 Countries by Covid Cases

Let’s look at the top 15 countries to see whether Covid cases are concentrated on one continent or spread evenly. The chart of the top 15 countries shows that only one country is from Africa: South Africa. This again suggests that richer countries are better able to test for and count Covid cases; South Africa is one of the wealthiest economies in Africa.

The largest share of the top 15 is located in Europe: 7 of the 15 countries are European.

Two countries are from North America: the United States and Mexico. South America is represented by three countries, Brazil, Colombia and Argentina, which are also the most populous on the continent.

One country is from the Middle East, Turkey, and one is from Asia, India.
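A ranking like this can be sketched by sorting on total_cases and keeping the first rows. `top_countries()` is a hypothetical helper, and the rows below are only the sample rows printed earlier in the document, so the result illustrates the mechanics rather than the real top 15.

```r
# Hypothetical helper: the n countries with the most total cases.
top_countries <- function(countries, n = 15) {
  ranked <- countries[order(countries$total_cases, decreasing = TRUE), ]
  head(ranked, n)
}

countries <- data.frame(
  location = c("Afghanistan", "Albania", "Algeria", "Andorra", "Angola",
               "Antigua and Barbuda", "Argentina", "Armenia", "Australia", "Austria"),
  continent = c("Asia", "Europe", "Africa", "Europe", "Africa",
                "North America", "South America", "Asia", "Oceania", "Europe"),
  total_cases = c(54403, 69916, 104852, 9379, 19177,
                  192, 1843077, 165528, 28755, 399798),
  stringsAsFactors = FALSE
)

top3 <- top_countries(countries, n = 3)
print(top3)

# How the selected countries are spread over continents.
print(table(top3$continent))
```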

10. Covid in Rich Countries

The scatterplot below is dedicated to one question: is it mostly rich countries that are getting infected with the virus? At first glance the answer is yes: the higher the GDP_per_capita, the higher the total_cases_per_million. For example, Luxembourg, Qatar and Singapore are the three countries with the highest GDP_per_capita, and they are also near the top for total_cases_per_million.

The green points in the top right represent European countries, which rank at the top of the scatterplot on both metrics, GDP_per_capita and total_cases_per_million. By contrast, African countries, in red, are far lower on both metrics.
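The visual impression from a scatterplot can be complemented with a rank correlation. The sketch below is an addition, not part of the original analysis: it uses Spearman's coefficient, which depends only on ranks and so is unaffected by the skew in GDP_per_capita. The rows are the sample rows printed earlier in the document (Andorra is left out because its GDP_per_capita is shown as 0), so the value it prints says nothing about the full dataset.

```r
countries <- data.frame(
  location = c("Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda",
               "Argentina", "Armenia", "Australia", "Austria"),
  gdp_per_capita = c(1803.987, 11803.431, 13913.839, 5819.495, 21490.943,
                     18933.907, 8787.580, 44648.710, 45436.686),
  total_cases_per_million = c(1397.517, 24294.948, 2391.095, 583.486, 1960.624,
                              40779.850, 55860.590, 1127.652, 44390.433),
  stringsAsFactors = FALSE
)

# Spearman rank correlation between wealth and cases per million.
rho <- cor(countries$gdp_per_capita, countries$total_cases_per_million,
           method = "spearman")
print(rho)
```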

11. Distribution of Wealth by Continents

Does equality exist between countries on different continents? Let’s study the histograms below. Europe is more or less normally distributed, which would be desirable for all other continents, but is the distribution of GDP_per_capita the same on the other continents?

Africa sits mostly on the left side: many poor countries with only a small GDP_per_capita.

Asia, to a lesser extent, and North America and South America are more or less normally distributed; however, North America and Asia have outliers such as the United States, Canada, Singapore and Hong Kong.

continent location date new_cases new_deaths total_cases total_deaths total_cases_per_million population
Asia Afghanistan 2020-02-24 1 NA 1 NA 0.026 38928341
Asia Afghanistan 2020-02-25 0 NA 1 NA 0.026 38928341
Asia Afghanistan 2020-02-26 0 NA 1 NA 0.026 38928341
Asia Afghanistan 2020-02-27 0 NA 1 NA 0.026 38928341
Asia Afghanistan 2020-02-28 0 NA 1 NA 0.026 38928341

12. Development of Covid Cases

The racing chart below shows how Covid cases increased on different continents from the start of the pandemic until now.

The figures show total deaths on the left side and total cases on the right side.

In Asia, India started quite late but now leads in the number of Covid cases.

In Europe, case counts rise very quickly, but fortunately the countries do not climb far on the total deaths axis.

Unfortunately, in North America the United States climbs the total deaths axis, and in South America Brazil follows the US trend.

13. World Map

The map shows countries coloured by GDP_per_capita: the darker the colour, the higher the GDP_per_capita. North America, Europe and Australia are mostly dark because of their high GDP_per_capita.

The map is interactive: when you hover over a country, it displays its GDP_per_capita, total cases, cases per million and population.

        Date Currency Exchange_rate
1 2021-01-21      USD        1.2158
2 2021-01-20      USD        1.2101
3 2021-01-19      USD        1.2132
4 2021-01-18      USD        1.2064
5 2021-01-15      USD        1.2123
6 2021-01-14      USD        1.2124
# A tibble: 6 x 4
# Groups:   date [6]
  date       continent Mean_ex     cases_pm
  <date>     <chr>       <dbl>        <dbl>
1 2020-01-23 Asia         7.69 0.0000000660
2 2020-01-24 Asia         7.65 0.000000192 
3 2020-01-27 Asia         7.65 0.000000557 
4 2020-01-28 Asia         7.63 0.00000183  
5 2020-01-29 Asia         7.63 0.000000402 
6 2020-01-30 Asia         7.65 0.00000143  

d3tree3
     Code       date location continent total_cases_per_million population gdp_per_capita stringency_index median_age human_development_index CountryCode Currency Exchange_rate
3323  BGN 2020-03-09 Bulgaria    Europe                   0.576    6948445       18563.31            21.30       44.7                   0.813          BG      Lev        1.9558
3324  BGN 2020-03-10 Bulgaria    Europe                   0.576    6948445       18563.31            21.30       44.7                   0.813          BG      Lev        1.9558
3325  BGN 2020-03-11 Bulgaria    Europe                   1.007    6948445       18563.31            26.85       44.7                   0.813          BG      Lev        1.9558
3326  BGN 2020-03-12 Bulgaria    Europe                   1.007    6948445       18563.31            26.85       44.7                   0.813          BG      Lev        1.9558
3327  BGN 2020-03-13 Bulgaria    Europe                   3.310    6948445       18563.31            50.93       44.7                   0.813          BG      Lev        1.9558
 [ reached 'max' / getOption("max.print") -- omitted 1 rows ]
# A tibble: 30 x 10
# Groups:   continent [1]
   continent population gdp_per_capita stringency_index median_age human_development_index Currency Exchange_rate  Norm Norm_cases
   <chr>          <dbl>          <dbl>            <dbl>      <dbl>                   <dbl> <chr>            <dbl> <dbl>      <dbl>
 1 Europe       6948445         18563.             21.3       44.7                   0.813 Lev               1.96     1  0.0000189
 2 Europe       6948445         18563.             21.3       44.7                   0.813 Lev               1.96     1  0.0000189
 3 Europe       6948445         18563.             26.8       44.7                   0.813 Lev               1.96     1  0.0000331
 4 Europe       6948445         18563.             26.8       44.7                   0.813 Lev               1.96     1  0.0000331
 5 Europe       6948445         18563.             50.9       44.7                   0.813 Lev               1.96     1  0.000109 
 6 Europe       6948445         18563.             50.9       44.7                   0.813 Lev               1.96     1  0.000246 
 7 Europe       6948445         18563.             56.5       44.7                   0.813 Lev               1.96     1  0.000317 
 8 Europe       6948445         18563.             70.4       44.7                   0.813 Lev               1.96     1  0.000435 
 9 Europe       6948445         18563.             70.4       44.7                   0.813 Lev               1.96     1  0.000444 
10 Europe       6948445         18563.             70.4       44.7                   0.813 Lev               1.96     1  0.000600 
# ... with 20 more rows
MODEL INFO:
Observations: 6360
Dependent Variable: Norm
Type: OLS linear regression 

MODEL FIT:
F(6,6353) = 176.72, p = 0.00
R² = 0.14
Adj. R² = 0.14 

Standard errors: OLS
-----------------------------------------------------------
                                Est.   S.E.   t val.      p
---------------------------- ------- ------ -------- ------
(Intercept)                     0.91   0.00   313.03   0.00
Norm_cases                      0.03   0.00    19.12   0.00
continentAsia                   0.02   0.00     5.78   0.00
continentEurope                 0.04   0.00    14.15   0.00
continentNorth America          0.02   0.00     6.35   0.00
continentOceania               -0.00   0.00    -1.01   0.31
continentSouth America         -0.02   0.00    -5.02   0.00
-----------------------------------------------------------
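The table above has the layout of an OLS summary with Norm as the dependent variable and Norm_cases plus continent dummies as predictors. A minimal sketch of fitting a model with that formula follows. The column names are taken from the output above, but the data is simulated purely so the example runs, and the fitting call is not necessarily the one the author used. Because no Africa coefficient appears in the table, Africa is presumably the reference level, which matches R's default of using the first factor level alphabetically.

```r
# Simulated stand-in data (NOT the project's data): a normalized exchange
# rate that depends weakly on normalized case counts, plus noise.
set.seed(1)
n <- 60
toy <- data.frame(
  Norm_cases = runif(n),
  continent = factor(rep(c("Africa", "Asia", "Europe"), each = n / 3))
)
toy$Norm <- 0.9 + 0.03 * toy$Norm_cases + rnorm(n, sd = 0.01)

# OLS with continent entered as dummy variables; Africa is the baseline.
fit <- lm(Norm ~ Norm_cases + continent, data = toy)
print(summary(fit)$coefficients)
```

Each continent coefficient is then read as the difference from the baseline continent at the same level of Norm_cases.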

Conclusion

In this project we analyzed different datasets: Covid cases by country and date, economic parameters by country, and foreign exchange rates for different countries. We specifically compared rich and poor countries and how Covid developed in each group. We also looked at the data by continent to see the patterns.

We used advanced visualizations, including trend charts, histograms, scatterplots and animated graphs, to see insights more clearly than we could from dry tables of numbers. With the visualizations, the patterns are much easier to see than they would be without them.

One of the clear insights from the analysis is that countries in the rich group have more Covid cases. This can be explained by two factors: 1. They are able to test their populations. 2. They are able to count Covid cases. By contrast, poor countries are less able to count Covid cases and do not have the funds to provide free or organized testing for the population.

There are outliers on each continent or region: South Africa in Africa, India in Asia, Turkey in the Middle East, the United States in North America and Brazil in South America.

While death counts are high in North America, South America and Asia, in Europe they remain steady at a lower level. This may point to the high quality of the health systems in European countries compared with countries outside Europe.

References

  1. Code and materials provided by Mgr Piotr Ćwiakowski during the graduate course in Advanced Visualizations in R, University of Warsaw, Faculty of Economic Sciences
  2. Antoine Soetewey, COVID-19 in Belgium: is it over yet? https://statsandr.com/blog/covid-19-in-belgium-is-it-over-yet/
  3. Data source: https://github.com/owid/covid-19-data/tree/master/public/data
  4. Coronavirus COVID-19 outbreak statistics and forecast v0.86 http://www.bcloud.org/e/
  5. The Top 74 Ggplot2 Open Source Projects https://awesomeopensource.com/projects/ggplot2
  6. gganimate https://github.com/thomasp85/gganimate